
    Towards Practical Control of Singular Values of Convolutional Layers

    Full text link
    In general, convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control. Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties and offered several methods for controlling them. Nevertheless, these methods present an intractable computational challenge or resort to coarse approximations. In this paper, we offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity. Our method is based on the tensor-train decomposition; it retains control over the actual singular values of convolutional mappings while providing structurally sparse and hardware-friendly representation. We demonstrate the improved properties of modern CNNs with our method and analyze its impact on the model performance, calibration, and adversarial robustness. The source code is available at: https://github.com/WhiteTeaDragon/practical_svd_conv
    Comment: Published as a conference paper at NeurIPS 202
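The exact-but-expensive baseline that such work builds on computes a convolutional layer's singular values via the FFT: for a circularly padded convolution, the layer's singular values are the union, over spatial frequencies, of the singular values of small per-frequency channel matrices. A minimal sketch, assuming circular padding and an n x n input (the function name and setup are illustrative, not this paper's TT-based method):

```python
import numpy as np

def conv_singular_values(kernel, n):
    """Singular values of a circularly padded conv layer via the FFT trick.
    kernel: (c_out, c_in, h, w); input spatial size is n x n."""
    c_out, c_in, h, w = kernel.shape
    # Zero-pad the kernel to the input size and FFT over spatial dims.
    padded = np.zeros((c_out, c_in, n, n), dtype=complex)
    padded[:, :, :h, :w] = kernel
    transformed = np.fft.fft2(padded)            # (c_out, c_in, n, n)
    # At each of the n*n frequencies the layer acts as a c_out x c_in
    # matrix; the union of their singular values is the layer's spectrum.
    mats = transformed.transpose(2, 3, 0, 1)     # (n, n, c_out, c_in)
    return np.linalg.svd(mats, compute_uv=False).ravel()

k = np.random.randn(4, 3, 3, 3)
svals = conv_singular_values(k, 8)               # 8*8*min(4,3) values
```

This costs an SVD per frequency, which is what makes exact control expensive for large inputs and motivates structured parameterizations.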

    TT-NF: Tensor Train Neural Fields

    Full text link
    Learning neural fields has been an active topic in deep learning research, focusing, among other issues, on finding more compact and easy-to-fit representations. In this paper, we introduce a novel low-rank representation termed Tensor Train Neural Fields (TT-NF) for learning neural fields on dense regular grids, along with efficient methods for sampling from them. Our representation is a TT parameterization of the neural field, trained with backpropagation to minimize a non-convex objective. We analyze the effect of low-rank compression on the downstream task quality metrics in two settings. First, we demonstrate the efficiency of our method in a sandbox task of tensor denoising, which admits comparison with SVD-based schemes designed to minimize reconstruction error. Second, we apply the proposed approach to Neural Radiance Fields, where the low-rank structure of the field corresponding to the best quality can be discovered only through learning.
    Comment: Preprint, under review
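The SVD-based scheme used as a denoising reference point amounts to truncating the singular value decomposition, which by the Eckart-Young theorem minimizes reconstruction error at a fixed rank. A minimal matrix-case sketch (names and data are illustrative, not the paper's code):

```python
import numpy as np

def truncated_svd_denoise(noisy, rank):
    """Best rank-r approximation in Frobenius norm (Eckart-Young)."""
    u, s, vt = np.linalg.svd(noisy, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

rng = np.random.default_rng(0)
clean = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))  # rank-4 signal
noisy = clean + 0.01 * rng.standard_normal((64, 64))
denoised = truncated_svd_denoise(noisy, rank=4)
```

Truncation discards the noise-dominated trailing singular directions, so the rank-4 reconstruction lands closer to the clean signal than the noisy input does.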

    Breathing New Life into 3D Assets with Generative Repainting

    Full text link
    Diffusion-based text-to-image models have attracted immense attention from the vision community, artists, and content creators. Broad adoption of these models is due to significant improvement in the quality of generations and efficient conditioning on various modalities, not just text. However, lifting the rich generative priors of these 2D models into 3D is challenging. Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields. We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools and demonstrate their ability to work together in a non-learned fashion. Such modularity has the intrinsic advantage of easy partial upgrades, which has become an important property in such a fast-paced domain. Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools, and outputs a painted input geometry in several formats. We conduct a large-scale study on a wide range of objects and categories from the ShapeNetSem dataset and demonstrate the advantages of our approach, both qualitatively and quantitatively. Project page: https://www.obukhov.ai/repainting_3d_asset

    Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation

    Full text link
    We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting. Semantic segmentation and monocular depth estimation are shown to be complementary tasks; in a multi-task learning setting, a proper encoding of their relationships can further improve performance on both tasks. Motivated by this observation, we propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions. To capture the cross-task relationships, we propose a neural network architecture that contains task-specific and cross-task refinement heads. Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain. We experimentally observe improvements in both tasks' performance because the complementary information present in these tasks is better captured. Specifically, we show that: (1) our approach improves performance on all tasks when they are complementary and mutually dependent; (2) CTRL helps to improve both semantic segmentation and depth estimation performance in the challenging UDA setting; (3) the proposed ISL training scheme further improves the semantic segmentation performance. The implementation is available at https://github.com/susaha/ctrl-uda.
    Comment: Accepted at CVPR 2021; updated results according to the released source code

    DiffDreamer: Consistent Single-view Perpetual View Generation with Conditional Diffusion Models

    Full text link
    Perpetual view generation -- synthesizing long-range novel views by flying into a given image -- is a novel yet promising task. We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. We demonstrate that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving both local and global consistency significantly better than prior GAN-based methods. Project page: https://primecai.github.io/diffdreamer

    Quantum Imaging with Incoherently Scattered Light from a Free-Electron Laser

    Full text link
    The advent of accelerator-driven free-electron lasers (FEL) has opened new avenues for high-resolution structure determination via diffraction methods that go far beyond conventional x-ray crystallography. These techniques rely on coherent scattering processes that require the maintenance of first-order coherence of the radiation field throughout the imaging procedure. Here we show that higher-order degrees of coherence, displayed in the intensity correlations of incoherently scattered x-rays from an FEL, can be used to image two-dimensional objects with a spatial resolution close to or even below the Abbe limit. This constitutes a new approach towards structure determination based on incoherent processes, including Compton scattering, fluorescence emission, or wavefront distortions, generally considered detrimental for imaging applications. Our method is an extension of the landmark intensity correlation measurements of Hanbury Brown and Twiss to orders higher than second, paving the way towards determination of structure and dynamics of matter in regimes where coherent imaging methods have intrinsic limitations.
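The second-order correlation at the heart of the Hanbury Brown-Twiss measurement can be illustrated numerically: the normalized correlation g^(2) = <I1 I2> / (<I1><I2>) equals about 2 for thermal (chaotic) light detected at zero delay and about 1 for uncorrelated signals. A toy sketch, not the paper's x-ray setup (function name and exponential-intensity model are assumptions):

```python
import numpy as np

def g2(i1, i2):
    """Normalized second-order intensity correlation between two signals."""
    return np.mean(i1 * i2) / (np.mean(i1) * np.mean(i2))

rng = np.random.default_rng(1)
# Thermal light has exponentially distributed intensity; at zero delay
# both detectors see the same fluctuations, giving photon bunching.
i = rng.exponential(1.0, 100_000)
bunched = g2(i, i)                               # ~2 for thermal light
uncorr = g2(i, rng.exponential(1.0, 100_000))    # ~1 for independent signals
```

The excess correlation (g^(2) > 1) is the statistical signal that higher-order generalizations of this quantity exploit for imaging.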

    Tensor Decompositions in Deep Learning

    No full text
    Tensor decompositions form a subdomain of multilinear algebra concerned with dimensionality reduction and analysis of multi-dimensional arrays (tensors). The field has numerous applications in physics, chemistry, life sciences, and recently, machine learning, computer vision, and graphics. Despite the maturity of the field, much progress has happened in recent years thanks to affordable parallel compute, driving empirical research. Deep learning is a young subdomain of machine learning concerned with fitting deep, non-linear parametric models in a non-convex optimization setting with abundant data. The tipping point of interest in deep learning came when a neural network (AlexNet) set a record-high score on a popular image classification benchmark (ImageNet), thus promising to solve long-standing computer vision problems. Over the past years, most breakthroughs in deep learning have come from finding smarter ways to increase model size and complexity. However, the need to deploy deep models on edge devices, such as for computational photography on mobile phones, has set a new direction for finding lean models. On the other hand, many high-potential deep learning techniques, such as Neural Radiance Fields (NeRF) or vision transformers, leave a huge margin for improvement upon inception. In this thesis, we investigate the use of tensor decompositions in the context of modern deep learning techniques. We aim to improve various types of efficiency: memory footprint and runtime performance, measured in parameters and floating-point operations (FLOPs), respectively. We begin by exploring neural network layer compression schemes and propose a tensorized representation with a basis tensor shared among layers and per-layer coefficients. Subsequently, we study the manifold of Tensor Train (TT) decompositions of fixed rank in the context of parameterizing layers of Generative Adversarial Networks (GANs) and demonstrate the ability to compress networks while maintaining the stability of training. Finally, we utilize TT parameterization to learn compressed NeRFs and devise sampling schemes with support for automatic differentiation to facilitate training. Unlike most previous works on tensor decompositions, we treat decompositions as models in the deep learning sense and update their parameters through backpropagation and optimization. As in prior art, tensorized formats admit certain algebraic operations, making them an appealing entity at the intersection of two prominent research directions.
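The classical TT-SVD algorithm underlying Tensor Train parameterizations can be sketched as follows: cores are extracted by sequential SVDs of reshaped unfoldings. Unlike the learned decompositions described in the thesis, which are fitted by backpropagation, this is the direct algebraic construction (function names are illustrative):

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way array into TT cores of shape (r_prev, n_k, r_k)
    via sequential truncated SVDs (the standard TT-SVD algorithm)."""
    shape = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(rank * shape[0], -1)
    for k, n in enumerate(shape[:-1]):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(rank, n, r))
        rank = r
        # Carry the remainder forward and unfold along the next mode.
        mat = (s[:r, None] * vt[:r]).reshape(rank * shape[k + 1], -1)
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract TT cores back into a dense array."""
    out = cores[0]
    for c in cores[1:]:
        out = np.tensordot(out, c, axes=([-1], [0]))
    return out[0, ..., 0]  # drop the boundary ranks of size 1
```

When `max_rank` is at least the exact TT ranks, the reconstruction is exact; lowering it trades accuracy for the parameter savings the thesis exploits.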